3 research outputs found

    Flexible binaural resynthesis of room impulse responses for augmented reality research

    A basic building block in audio for Augmented Reality (AR) is the use of virtual sound sources layered on top of any real sources present in an environment. In order to perceive these virtual sources as belonging to the natural scene, it is important to match their acoustic parameters to those of a real source with the same characteristics, i.e. radiation properties, sound propagation and head-related impulse response (HRIR). However, it is still unclear to what extent these parameters need to be matched in order to generate plausible scenes in which virtual sound sources blend seamlessly with real sound sources. This contribution presents an auralization framework that allows prototyping of augmented reality scenarios from measured multichannel room impulse responses to get a better understanding of the relevance of individual acoustic parameters. A well-established approach for binaural measurement and reproduction of sound scenes is based on capturing binaural room impulse responses (BRIR) using a head and torso simulator (HATS) and convolving these BRIRs dynamically with audio content according to the listener head orientation. However, such measurements are laborious and time-consuming, requiring measuring the scene with the HATS in multiple orientations. Additionally, the HATS HRIR is inherently encoded in the BRIRs, making them unsuitable for personalization for different listeners. The approach presented here consists of the resynthesis and dynamic binaural reproduction of multichannel room impulse responses (RIR) using an arbitrary HRIR dataset. Using a compact microphone array, we obtained a pressure RIR and a set of auxiliary RIRs, and we applied the Spatial Decomposition Method (SDM) to estimate the direction-of-arrival (DOA) of the different sound events in the RIR.
The DOA information was used to map sound pressure to different locations by means of an HRIR dataset, generating a binaural room impulse response (BRIR) for a specific orientation. By rotating either the DOA or the HRIR dataset, BRIRs for any direction may be obtained. Auralizations using SDM are known to whiten the spectrum of late reverberation. Available alternatives such as time-frequency equalization were not feasible in this case, as a different time-frequency filter would be necessary for each direction, resulting in a non-homogeneous equalization of the BRIRs. Instead, the resynthesized BRIRs were decomposed into sub-bands and the decay slope of each sub-band was modified independently to match the reverberation time of the original pressure RIR. In this way we could apply the same reverberation correction factor to all BRIRs. In addition, we used a direction-independent equalization to correct for timbral effects of equipment, HRIR, and signal processing. Real-time reproduction was achieved by means of a custom Max/MSP patch, in which the direct sound, early reflections and late reverberation were convolved separately to allow real-time changes in the time-energy properties of the BRIRs. The mixing time of the reproduced BRIRs is configurable and a single direction-independent reverberation tail is used. To evaluate the quality of the resynthesis method in a real room, we conducted both objective and perceptual comparisons for a variety of source positions. The objective analysis was performed by comparing real measurements of a KEMAR mannequin with the resynthesis at the same receiver location using a simulated KEMAR HRIR. Typical room acoustic parameters of both real and resynthesized acoustics were found to be in good agreement. The perceptual validation consisted of a comparison of a loudspeaker and its resynthesized counterpart.
Non-occluding headphones with individual equalization were used to ensure that listeners were able to simultaneously listen to the real and the virtual samples. Subjects were allowed to listen to the sounds for as long as they desired and freely switch between the real and virtual stimuli in real time. The integration of an Optitrack motion tracking system allowed us to present world-locked audio, accounting for head rotations. We present here the results of this listening test (N = 14) with three sections: discrimination, identification, and qualitative ratings. Preliminary analysis revealed that in these conditions listeners were generally able to discriminate between real and virtual sources and were able to consistently identify which of the presented sources was real and which was virtual. The qualitative analysis revealed that timbral differences are the most prominent cues for discrimination and identification, while spatial cues are well preserved. All the listeners reported good externalization of the binaural audio. Future work includes extending the presented validation to more environments, as well as implementing tools to arbitrarily modify BRIRs in the spatial, temporal, and frequency domains in order to study the perceptual requirements of room acoustics reproduction in AR.
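
The DOA-to-HRIR mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that one DOA unit vector is already available for every sample of the pressure RIR, that the HRIR for each sample is chosen by nearest-neighbour lookup, and that head rotation is a pure yaw about the vertical axis. The function name `synthesize_brir` and all parameter names are hypothetical, and the reverberation correction and equalization steps are omitted.

```python
import numpy as np


def synthesize_brir(pressure_rir, doa, hrir_dirs, hrirs, yaw_deg=0.0):
    """Sketch of an SDM-style BRIR synthesis (illustrative assumptions only).

    pressure_rir: (N,) pressure room impulse response
    doa:          (N, 3) unit vectors, one estimated direction per sample
    hrir_dirs:    (D, 3) unit vectors of the HRIR measurement grid
    hrirs:        (D, 2, L) left/right HRIRs for each grid direction
    yaw_deg:      head yaw in degrees, counter-clockwise seen from above
    """
    # Rotate the DOAs by -yaw so they are expressed relative to the head.
    yaw = np.deg2rad(yaw_deg)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    doa_head = doa @ rot.T

    # Nearest HRIR direction = largest dot product between unit vectors.
    nearest = np.argmax(doa_head @ hrir_dirs.T, axis=1)

    n_samples, hrir_len = len(pressure_rir), hrirs.shape[2]
    brir = np.zeros((2, n_samples + hrir_len - 1))
    for i in range(n_samples):
        # Each pressure sample contributes a scaled, delayed HRIR pair.
        brir[:, i:i + hrir_len] += pressure_rir[i] * hrirs[nearest[i]]
    return brir
```

Calling the function once per head orientation (for example, one call per degree of yaw) would yield the set of BRIRs that a dynamic renderer switches between as the listener turns.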

    Auralization systems for simulation of augmented reality experiences in virtual environments

    Augmented reality has the potential to connect people anywhere, anytime, and provide them with interactive virtual objects that enhance their lives. To deliver contextually appropriate audio for these experiences, a much greater understanding of how users will interact with augmented content and each other is needed. This contribution presents a system for evaluating human behavior and augmented reality device performance in calibrated synthesized environments. The system consists of a spherical loudspeaker array capable of spatial audio reproduction in a noise-isolated and acoustically damped room. The space is equipped with motion capture systems that track listener position, orientation, and eye gaze direction in temporal synchrony with audio playback and capture to allow for interactive control over the acoustic environment. In addition to spatial audio content from the loudspeaker array, supplementary virtual objects can be presented to listeners using motion-tracked unoccluding headphones. The system facilitates a wide array of studies relating to augmented reality research including communication ecology, spatial hearing, room acoustics, and device performance. System applications and configuration, calibration, processing, and validation routines are presented.

    Perceptual Evaluation of Approaches for Binaural Reproduction of Non-Spherical Microphone Array Signals

    Microphone arrays consisting of sensors mounted on the surface of a rigid, spherical scatterer are popular tools for the capture and binaural reproduction of spatial sound scenes. However, microphone arrays with a perfectly spherical body and uniformly distributed microphones are often impractical for the consumer sector, in which microphone arrays are generally mounted on mobile and wearable devices of arbitrary geometries. Therefore, the binaural reproduction of sound fields captured with arbitrarily shaped microphone arrays has become an important field of research. In this work, we present a comparison of methods for the binaural reproduction of sound fields captured with non-spherical microphone arrays. First, we evaluated equatorial microphone arrays (EMAs), where the microphones are distributed on an equatorial contour of a rigid, spherical scatterer. Second, we evaluated a microphone array with six microphones mounted on a pair of glasses. Using these two arrays, we conducted two listening experiments comparing four rendering methods based on acoustic scenes captured in different rooms. The evaluation includes a microphone-based stereo approach (sAB stereo), a beamforming-based stereo approach (sXY stereo), beamforming-based binaural reproduction (BFBR), and BFBR with binaural signal matching (BSM). Additionally, the perceptual evaluation included binaural Ambisonics renderings, which were based on measurements with spherical microphone arrays. In the EMA experiment we included a fourth-order Ambisonics rendering, while in the glasses array experiment we included a second-order Ambisonics rendering. In both listening experiments, in which participants compared all approaches with a dummy head recording, we applied non-head-tracked binaural synthesis, with sound sources only in the horizontal plane. The perceived differences were rated separately for the attributes timbre and spaciousness.
Results suggest that most approaches perform similarly to the Ambisonics rendering. Overall, BSM and microphone-based stereo were rated the best for EMAs, and BFBR and microphone-based stereo for the glasses array.